Skip to content

CNTRLPLANE-3276: Add Azure endpoint access transition e2e test#8718

Draft
bryan-cox wants to merge 1 commit into
openshift:mainfrom
bryan-cox:CNTRLPLANE-3276
Draft

CNTRLPLANE-3276: Add Azure endpoint access transition e2e test#8718
bryan-cox wants to merge 1 commit into
openshift:mainfrom
bryan-cox:CNTRLPLANE-3276

Conversation

@bryan-cox

@bryan-cox bryan-cox commented Jun 11, 2026

Copy link
Copy Markdown
Member

What this PR does / why we need it:

Adds a v2 e2e lifecycle test that validates transitioning a HostedCluster between Private and PublicAndPrivate topology on Azure self-managed clusters. The test verifies:

  1. Private → PublicAndPrivate: Updates topology, waits for PublicEndpointExposed condition to become True with reason SharedIngressConfigured, confirms KAS service becomes LoadBalancer, validates PLS CRs persist, and checks API reachability.
  2. PublicAndPrivate → Private: Restores topology, waits for PublicEndpointExposed condition to become False with reason TopologyPrivate, confirms KAS service returns to ClusterIP, validates PLS CRs persist.
  3. Private connectivity verification: Confirms API server is reachable via the private path after restore.

The test runs on the existing private cluster variant (labeled self-managed-azure-private) after AzurePrivateTopologyTest completes. DeferCleanup restores Private topology if the test fails mid-transition.

Which issue(s) this PR fixes:

Fixes https://issues.redhat.com/browse/CNTRLPLANE-3276

Special notes for your reviewer:

  • No changes to lifecycle/azure.go — the private variant's LabelFilter already includes self-managed-azure-private
  • Uses e2eutil.ConditionPredicate[*hyperv1.HostedCluster] rather than a custom condition helper
  • DeferCleanup uses context.Background() intentionally — cleanup may run after testCtx.Context is cancelled on timeout
  • The sync.Once-cached guest client in GetHostedClusterClient() is safe across topology transitions because the private endpoint stays active in PublicAndPrivate mode

Checklist:

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

Summary by CodeRabbit

  • Tests
    • Added a new Azure-only end-to-end HostedCluster test that runs in order for clusters starting with Azure Private topology, validating a Private → PublicAndPrivate → Private transition.
    • Verifies KAS external route behavior at each step (public route host populated; private route deleted or restored as expected).
    • Confirms Azure PrivateLink Service resources remain present with a non-empty alias throughout transitions.
    • Checks API reachability via namespace listings during the workflow and after restoration, and automatically restores the original Private topology on completion.

@openshift-merge-bot

Copy link
Copy Markdown
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jun 11, 2026
@openshift-ci-robot

openshift-ci-robot commented Jun 11, 2026

Copy link
Copy Markdown

@bryan-cox: This pull request references CNTRLPLANE-3276 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "5.0.0" version, but no target version was set.

Details

In response to this:

What this PR does / why we need it:

Adds a v2 e2e lifecycle test that validates transitioning a HostedCluster between Private and PublicAndPrivate topology on Azure self-managed clusters. The test verifies:

  1. Private → PublicAndPrivate: Updates topology, waits for PublicEndpointExposed condition to become True with reason SharedIngressConfigured, confirms KAS service becomes LoadBalancer, validates PLS CRs persist, and checks API reachability.
  2. PublicAndPrivate → Private: Restores topology, waits for PublicEndpointExposed condition to become False with reason TopologyPrivate, confirms KAS service returns to ClusterIP, validates PLS CRs persist.
  3. Private connectivity verification: Confirms API server is reachable via the private path after restore.

The test runs on the existing private cluster variant (labeled self-managed-azure-private) after AzurePrivateTopologyTest completes. DeferCleanup restores Private topology if the test fails mid-transition.

Which issue(s) this PR fixes:

Fixes https://issues.redhat.com/browse/CNTRLPLANE-3276

Special notes for your reviewer:

  • No changes to lifecycle/azure.go — the private variant's LabelFilter already includes self-managed-azure-private
  • Uses e2eutil.ConditionPredicate[*hyperv1.HostedCluster] rather than a custom condition helper
  • DeferCleanup uses context.Background() intentionally — cleanup may run after testCtx.Context is cancelled on timeout
  • The sync.Once-cached guest client in GetHostedClusterClient() is safe across topology transitions because the private endpoint stays active in PublicAndPrivate mode

Checklist:

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 11, 2026
@openshift-ci

openshift-ci Bot commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@coderabbitai

coderabbitai Bot commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR adds a new e2e test, AzureEndpointAccessTransitionTest, that validates HostedCluster topology transitions on Azure. The test transitions a cluster from Private to PublicAndPrivate and back to Private, verifying that KAS external routes appear and are deleted appropriately for each topology state, AzurePrivateLinkService CRs persist with non-empty Status.PrivateLinkServiceAlias, and the hosted cluster API remains reachable throughout. Supporting helper predicates match service types, verify PLS alias presence, and poll API reachability via namespace listing. The test is registered immediately after AzurePrivateTopologyTest.

Suggested reviewers

  • cblecker
  • Nirshal
  • csrwng
🚥 Pre-merge checks | ✅ 11
✅ Passed checks (11 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: adding an Azure endpoint access transition e2e test, which matches the core functionality added in the PR.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed All test names in AzureEndpointAccessTransitionTest are static descriptive strings without dynamic values, pod names, timestamps, UUIDs, node names, or IP addresses.
Test Structure And Quality ✅ Passed Test code follows all quality requirements: single responsibility per test, proper setup/cleanup via BeforeAll/DeferCleanup, timeouts on all cluster operations, meaningful assertion messages with f...
Topology-Aware Scheduling Compatibility ✅ Passed This PR adds only e2e test code (test/e2e/v2/tests/hosted_cluster_azure_test.go) with no deployment manifests, operator code, or controllers. The custom check's scope explicitly applies to these, m...
Ipv6 And Disconnected Network Test Compatibility ✅ Passed Test contains no hardcoded IPv4 addresses, IPv4-specific logic, or external connectivity requirements. All operations are cluster-internal Kubernetes API calls.
No-Weak-Crypto ✅ Passed The PR adds an e2e test for Azure topology transitions with no weak cryptography usage. No MD5, SHA1, DES, RC4, 3DES, Blowfish, ECB algorithms, custom crypto implementations, or non-constant-time s...
Container-Privileges ✅ Passed PR adds a Go e2e test file with no container/K8s manifests or privilege configurations. Check is not applicable to test code.
No-Sensitive-Data-In-Logs ✅ Passed Logging statements in AzureEndpointAccessTransitionTest log only non-sensitive data: Route hostnames (DNS names) and Azure PLS aliases (service metadata). No passwords, tokens, API keys, or PII are...

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci

openshift-ci Bot commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bryan-cox

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added approved Indicates a PR has been approved by an approver from all required OWNERS files. area/platform/azure PR/issue for Azure (AzurePlatform) platform area/testing Indicates the PR includes changes for e2e testing and removed do-not-merge/needs-area labels Jun 11, 2026
@bryan-cox

Copy link
Copy Markdown
Member Author

/test e2e-azure-v2-self-managed

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
test/e2e/v2/tests/hosted_cluster_azure_test.go (1)

270-280: 💤 Low value

Add WithInterval for polling consistency.

The condition check at lines 253-268 specifies WithInterval(15*time.Second), but this EventuallyObject call omits it. For consistent polling behavior across similar checks in this test, consider adding the interval.

Suggested fix
 			e2eutil.EventuallyObject(GinkgoTB(), ctx, "KAS service is LoadBalancer",
 				func(ctx context.Context) (*corev1.Service, error) {
 					svc := hcpmanifests.KubeAPIServerService(controlPlaneNamespace)
 					err := testCtx.MgmtClient.Get(ctx, crclient.ObjectKeyFromObject(svc), svc)
 					return svc, err
 				},
 				[]e2eutil.Predicate[*corev1.Service]{
 					serviceTypePredicate(corev1.ServiceTypeLoadBalancer),
 				},
 				e2eutil.WithTimeout(10*time.Minute),
+				e2eutil.WithInterval(15*time.Second),
 			)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/v2/tests/hosted_cluster_azure_test.go` around lines 270 - 280, Add a
consistent polling interval to the EventuallyObject call by including
e2eutil.WithInterval(15*time.Second) alongside the existing e2eutil.WithTimeout
option; locate the call to e2eutil.EventuallyObject (the block returning
*corev1.Service and using serviceTypePredicate) and append the WithInterval
option to its variadic options so it uses the same 15s polling cadence as the
earlier check.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@test/e2e/v2/tests/hosted_cluster_azure_test.go`:
- Around line 270-280: Add a consistent polling interval to the EventuallyObject
call by including e2eutil.WithInterval(15*time.Second) alongside the existing
e2eutil.WithTimeout option; locate the call to e2eutil.EventuallyObject (the
block returning *corev1.Service and using serviceTypePredicate) and append the
WithInterval option to its variadic options so it uses the same 15s polling
cadence as the earlier check.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 3b2222e8-a730-4864-a308-b08be14365df

📥 Commits

Reviewing files that changed from the base of the PR and between 35c0190 and 8a9c92f.

📒 Files selected for processing (1)
  • test/e2e/v2/tests/hosted_cluster_azure_test.go

@codecov

codecov Bot commented Jun 11, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 41.79%. Comparing base (392fd5a) to head (821ed0c).
⚠️ Report is 6 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8718      +/-   ##
==========================================
+ Coverage   41.75%   41.79%   +0.04%     
==========================================
  Files         758      759       +1     
  Lines       93981    94037      +56     
==========================================
+ Hits        39240    39304      +64     
+ Misses      51988    51983       -5     
+ Partials     2753     2750       -3     

see 5 files with indirect coverage changes

Flag Coverage Δ
cmd-support 35.11% <ø> (+0.09%) ⬆️
cpo-hostedcontrolplane 44.10% <ø> (ø)
cpo-other 43.45% <ø> (ø)
hypershift-operator 51.87% <ø> (+0.04%) ⬆️
other 31.56% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@bryan-cox

Copy link
Copy Markdown
Member Author

/test e2e-azure-v2-self-managed

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
test/e2e/v2/tests/hosted_cluster_azure_test.go (2)

279-309: ⚡ Quick win

Consider verifying HostedCluster condition PublicEndpointExposed returns to False.

The PR description states the test should "wait for PublicEndpointExposed to become False with reason TopologyPrivate" when transitioning back to Private. Currently, the test only verifies the KAS service type returns to ClusterIP. Adding a condition check would ensure the full reconciliation completed.

📋 Suggested addition to verify condition

After line 297, add:

 		e2eutil.EventuallyObject(GinkgoTB(), ctx, "KAS service is ClusterIP",
 			func(ctx context.Context) (*corev1.Service, error) {
 				svc := hcpmanifests.KubeAPIServerService(controlPlaneNamespace)
 				err := testCtx.MgmtClient.Get(ctx, crclient.ObjectKeyFromObject(svc), svc)
 				return svc, err
 			},
 			[]e2eutil.Predicate[*corev1.Service]{
 				serviceTypePredicate(corev1.ServiceTypeClusterIP),
 			},
 			e2eutil.WithTimeout(10*time.Minute),
 		)
+
+		e2eutil.EventuallyObject(GinkgoTB(), ctx, "HostedCluster PublicEndpointExposed condition is False",
+			func(ctx context.Context) (*hyperv1.HostedCluster, error) {
+				latest := &hyperv1.HostedCluster{}
+				err := testCtx.MgmtClient.Get(ctx, crclient.ObjectKeyFromObject(hc), latest)
+				return latest, err
+			},
+			[]e2eutil.Predicate[*hyperv1.HostedCluster]{
+				e2eutil.ConditionPredicate[*hyperv1.HostedCluster](
+					hyperv1.PublicEndpointExposed,
+					metav1.ConditionFalse,
+					hyperv1.TopologyPrivateReason,
+				),
+			},
+			e2eutil.WithTimeout(10*time.Minute),
+		)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/v2/tests/hosted_cluster_azure_test.go` around lines 279 - 309, Add
an assertion that the HostedCluster's PublicEndpointExposed condition flips to
False with reason TopologyPrivate after topology revert: use
e2eutil.EventuallyObject (like the existing KAS service check) to fetch the
HostedCluster (variable hc via testCtx.MgmtClient) and poll until
conditions.Get(hc.Status.Conditions, hyperv1.PublicEndpointExposed).Status ==
corev1.ConditionFalse and .Reason == "TopologyPrivate" (or use the helper that
checks condition values), with an appropriate timeout placed after the KAS
service ClusterIP check to ensure full reconciliation completed.

245-277: ⚡ Quick win

Consider verifying HostedCluster condition PublicEndpointExposed.

The PR description states the test should "wait for HostedCluster condition PublicEndpointExposed to become True with reason SharedIngressConfigured." Currently, the test only verifies the KAS service type changes to LoadBalancer. Adding a condition check would provide stronger validation that the control plane reconciliation loop completed successfully.

📋 Suggested addition to verify condition

After line 263, add a condition check using e2eutil.ConditionPredicate:

 		e2eutil.EventuallyObject(GinkgoTB(), ctx, "KAS service is LoadBalancer",
 			func(ctx context.Context) (*corev1.Service, error) {
 				svc := hcpmanifests.KubeAPIServerService(controlPlaneNamespace)
 				err := testCtx.MgmtClient.Get(ctx, crclient.ObjectKeyFromObject(svc), svc)
 				return svc, err
 			},
 			[]e2eutil.Predicate[*corev1.Service]{
 				serviceTypePredicate(corev1.ServiceTypeLoadBalancer),
 			},
 			e2eutil.WithTimeout(10*time.Minute),
 		)
+
+		e2eutil.EventuallyObject(GinkgoTB(), ctx, "HostedCluster PublicEndpointExposed condition is True",
+			func(ctx context.Context) (*hyperv1.HostedCluster, error) {
+				latest := &hyperv1.HostedCluster{}
+				err := testCtx.MgmtClient.Get(ctx, crclient.ObjectKeyFromObject(hc), latest)
+				return latest, err
+			},
+			[]e2eutil.Predicate[*hyperv1.HostedCluster]{
+				e2eutil.ConditionPredicate[*hyperv1.HostedCluster](
+					hyperv1.PublicEndpointExposed,
+					metav1.ConditionTrue,
+					hyperv1.SharedIngressConfiguredReason,
+				),
+			},
+			e2eutil.WithTimeout(10*time.Minute),
+		)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/v2/tests/hosted_cluster_azure_test.go` around lines 245 - 277, Add a
check that waits for the HostedCluster condition "PublicEndpointExposed" to be
True with reason "SharedIngressConfigured" (using the existing HostedCluster
object hc and testCtx.MgmtClient) to ensure control-plane reconciliation
completed; implement this by calling e2eutil.EventuallyObject (or the existing
helper that waits on conditions) with e2eutil.ConditionPredicate for condition
"PublicEndpointExposed" and reason "SharedIngressConfigured" (use
e2eutil.ConditionPredicate(hc, "PublicEndpointExposed", corev1.ConditionTrue,
"SharedIngressConfigured") or the appropriate signature) placed after the KAS
service LoadBalancer check/PLS verification and before verifyAPIReachable.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@test/e2e/v2/tests/hosted_cluster_azure_test.go`:
- Around line 279-309: Add an assertion that the HostedCluster's
PublicEndpointExposed condition flips to False with reason TopologyPrivate after
topology revert: use e2eutil.EventuallyObject (like the existing KAS service
check) to fetch the HostedCluster (variable hc via testCtx.MgmtClient) and poll
until conditions.Get(hc.Status.Conditions, hyperv1.PublicEndpointExposed).Status
== corev1.ConditionFalse and .Reason == "TopologyPrivate" (or use the helper
that checks condition values), with an appropriate timeout placed after the KAS
service ClusterIP check to ensure full reconciliation completed.
- Around line 245-277: Add a check that waits for the HostedCluster condition
"PublicEndpointExposed" to be True with reason "SharedIngressConfigured" (using
the existing HostedCluster object hc and testCtx.MgmtClient) to ensure
control-plane reconciliation completed; implement this by calling
e2eutil.EventuallyObject (or the existing helper that waits on conditions) with
e2eutil.ConditionPredicate for condition "PublicEndpointExposed" and reason
"SharedIngressConfigured" (use e2eutil.ConditionPredicate(hc,
"PublicEndpointExposed", corev1.ConditionTrue, "SharedIngressConfigured") or the
appropriate signature) placed after the KAS service LoadBalancer check/PLS
verification and before verifyAPIReachable.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 9e7d7d98-d5d6-4677-8778-67d0d9a0b9a5

📥 Commits

Reviewing files that changed from the base of the PR and between 8a9c92f and 13e0579.

📒 Files selected for processing (1)
  • test/e2e/v2/tests/hosted_cluster_azure_test.go

@bryan-cox

Copy link
Copy Markdown
Member Author

/test e2e-azure-v2-self-managed

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
test/e2e/v2/tests/hosted_cluster_azure_test.go (1)

246-314: ⚡ Quick win

Add labels to each new It for filterability consistency.

The new specs are under a labeled Context, but the It blocks themselves are unlabeled.

Proposed fix
-		It("should transition from Private to PublicAndPrivate", func() {
+		It("should transition from Private to PublicAndPrivate", Label("azure-endpoint-transition"), func() {
...
-		It("should transition from PublicAndPrivate back to Private", func() {
+		It("should transition from PublicAndPrivate back to Private", Label("azure-endpoint-transition"), func() {
...
-		It("should reach the API server after restoring Private topology", func() {
+		It("should reach the API server after restoring Private topology", Label("azure-endpoint-transition"), func() {

As per coding guidelines, apply labels to both Describe and It blocks for test filtering.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/v2/tests/hosted_cluster_azure_test.go` around lines 246 - 314, Add
labels to the three It blocks in the test to maintain filtering consistency with
the parent Context block. The It blocks "should transition from Private to
PublicAndPrivate", "should transition from PublicAndPrivate back to Private",
and "should reach the API server after restoring Private topology" need to be
labeled using the Label() function according to coding guidelines for test
filtering. Apply appropriate labels that categorize these topology transition
tests consistently.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/e2e/v2/tests/hosted_cluster_azure_test.go`:
- Around line 295-297: The assertion for the service type check is too loose and
only verifies that the type is not LoadBalancer, allowing false positives from
other unintended types like NodePort or ExternalName. Replace the
NotTo(Equal(corev1.ServiceTypeLoadBalancer)) check with explicit assertions that
verify the service is in one of the expected states after restore: either the
service should not be found (NotFound error) or it should have type ClusterIP.
This ensures the assertion catches actual regressions instead of passing for any
non-LoadBalancer type.
- Around line 236-243: The DeferCleanup function currently logs a warning when
the topology restore fails (when restoreErr is not nil) but continues execution,
which can leave the cluster state mutated and cause cascading failures in
subsequent tests. Modify the error handling block where restoreErr is checked to
fail the test using an appropriate Ginkgo assertion or failure method (such as
GinkgoTB().Fail) instead of just logging a warning, ensuring that any failure to
restore the Azure topology causes the test cleanup to fail and prevent state
corruption from affecting other specs.

---

Nitpick comments:
In `@test/e2e/v2/tests/hosted_cluster_azure_test.go`:
- Around line 246-314: Add labels to the three It blocks in the test to maintain
filtering consistency with the parent Context block. The It blocks "should
transition from Private to PublicAndPrivate", "should transition from
PublicAndPrivate back to Private", and "should reach the API server after
restoring Private topology" need to be labeled using the Label() function
according to coding guidelines for test filtering. Apply appropriate labels that
categorize these topology transition tests consistently.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 82fff169-72ca-4e38-910e-d973910bb5f5

📥 Commits

Reviewing files that changed from the base of the PR and between 13e0579 and 57c8e8e.

📒 Files selected for processing (1)
  • test/e2e/v2/tests/hosted_cluster_azure_test.go

Comment on lines +236 to +243
DeferCleanup(func() {
restoreErr := e2eutil.UpdateObject(GinkgoTB(), context.Background(), testCtx.MgmtClient, hc, func(obj *hyperv1.HostedCluster) {
obj.Spec.Platform.Azure.Topology = hyperv1.AzureTopologyPrivate
})
if restoreErr != nil {
GinkgoTB().Logf("WARNING: failed to restore Private topology: %v", restoreErr)
}
})

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail cleanup if topology restore does not succeed.

The cleanup currently logs and continues on restore failure, which can leave shared cluster state mutated and cause cascading failures in later specs.

Proposed fix
 			DeferCleanup(func() {
-				restoreErr := e2eutil.UpdateObject(GinkgoTB(), context.Background(), testCtx.MgmtClient, hc, func(obj *hyperv1.HostedCluster) {
+				cleanupCtx, cancel := context.WithTimeout(context.Background(), 2*time.Minute)
+				defer cancel()
+				restoreErr := e2eutil.UpdateObject(GinkgoTB(), cleanupCtx, testCtx.MgmtClient, hc, func(obj *hyperv1.HostedCluster) {
 					obj.Spec.Platform.Azure.Topology = hyperv1.AzureTopologyPrivate
 				})
-				if restoreErr != nil {
-					GinkgoTB().Logf("WARNING: failed to restore Private topology: %v", restoreErr)
-				}
+				Expect(restoreErr).NotTo(HaveOccurred(), "cleanup: failed to restore Private topology")
 			})

As per coding guidelines, when mutating cluster state, restore it via DeferCleanup in a fail-safe way on all exit paths.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/v2/tests/hosted_cluster_azure_test.go` around lines 236 - 243, The
DeferCleanup function currently logs a warning when the topology restore fails
(when restoreErr is not nil) but continues execution, which can leave the
cluster state mutated and cause cascading failures in subsequent tests. Modify
the error handling block where restoreErr is checked to fail the test using an
appropriate Ginkgo assertion or failure method (such as GinkgoTB().Fail) instead
of just logging a warning, ensuring that any failure to restore the Azure
topology causes the test cleanup to fail and prevent state corruption from
affecting other specs.

Source: Coding guidelines

Comment thread test/e2e/v2/tests/hosted_cluster_azure_test.go Outdated
@bryan-cox

Copy link
Copy Markdown
Member Author

/test e2e-azure-v2-self-managed

1 similar comment
@bryan-cox

Copy link
Copy Markdown
Member Author

/test e2e-azure-v2-self-managed

@bryan-cox

Copy link
Copy Markdown
Member Author

/test pull-ci-openshift-hypershift-main-e2e-azure-v2-self-managed

@bryan-cox

Copy link
Copy Markdown
Member Author

/test e2e-azure-v2-self-managed

@bryan-cox

Copy link
Copy Markdown
Member Author

/test e2e-azure-v2-self-managed

@bryan-cox

Copy link
Copy Markdown
Member Author

/test e2e-azure-v2-self-managed

@bryan-cox

Copy link
Copy Markdown
Member Author

/test e2e-azure-v2-self-managed

1 similar comment
@bryan-cox

Copy link
Copy Markdown
Member Author

/test e2e-azure-v2-self-managed

@bryan-cox

Copy link
Copy Markdown
Member Author

/test e2e-azure-v2-self-managed

@hypershift-jira-solve-ci

hypershift-jira-solve-ci Bot commented Jun 17, 2026

Copy link
Copy Markdown

The PR only modifies a single test file, confirming this failure is entirely unrelated to the code changes. Here is the complete analysis:

Test Failure Analysis Complete

Job Information

Test Failure Analysis

Error

golang.org/x/[email protected]: read "https://proxy.golang.org/golang.org/x/text/@v/v0.36.0.zip": stream error: stream ID 73; INTERNAL_ERROR; received from peer
make: *** [Makefile:542: deps] Error 1
##[error]Process completed with exit code 2.

Summary

The verify job failed during dependency resolution (make deps at Makefile:542), not during compilation, linting, or testing. The Go module proxy (proxy.golang.org) returned an HTTP/2 stream error (INTERNAL_ERROR) while downloading golang.org/x/[email protected], causing 13 sub-package import failures across the hack/tools module. This is a transient infrastructure issue on the Go module proxy side, completely unrelated to the PR's code changes (which only modify test/e2e/v2/tests/hosted_cluster_azure_test.go).

Root Cause

The root cause is a transient HTTP/2 stream error from the Go module proxy (proxy.golang.org).

During the make generate update step, the Makefile invokes the deps target (line 542) which runs go mod tidy && go mod vendor && go mod verify inside the hack/tools directory. This requires downloading dependencies from the Go module proxy.

The download of golang.org/x/[email protected] failed with:

stream error: stream ID 73; INTERNAL_ERROR; received from peer

This is an HTTP/2 protocol-level error originating from the proxy server — the server's HTTP/2 implementation sent an INTERNAL_ERROR frame on stream 73, aborting the transfer. This caused 13 sub-packages of golang.org/x/text to fail resolution simultaneously (since they all come from the same zip archive): runes, transform, unicode/norm, language, message, cases, encoding, encoding/charmap, encoding/htmlindex, width, encoding/unicode, secure/bidirule, and unicode/bidi.

The go mod tidy command treats any unresolvable import as a fatal error, which propagated up as make: *** [Makefile:542: deps] Error 1.

This failure is 100% infrastructure-related — the PR only changes test/e2e/v2/tests/hosted_cluster_azure_test.go and does not modify any go.mod, go.sum, vendor files, Makefile, or dependency-related files.

Recommendations
  1. Re-run the job — This is a transient Go module proxy error. A retry should succeed. Use the GitHub Actions UI "Re-run failed jobs" button or push a no-op commit.

  2. No code changes needed — The PR (single test file addition) has no relation to the dependency download failure.

  3. If the failure persists on retry, it may indicate a broader outage at proxy.golang.org. In that case:

    • Check Go module proxy status and Google Cloud Status
    • Consider setting GONOSUMCHECK or GOFLAGS=-goflags=-insecure temporarily (not recommended for production)
    • The hack/tools module uses vendoring (-mod=vendor), but go mod tidy still requires proxy access to resolve and verify the full dependency graph
Evidence
Evidence Detail
Failed make target make: *** [Makefile:542: deps] Error 1
Failing command cd hack/tools && go mod tidy && go mod vendor && go mod verify && go list -m -mod=readonly -json all > /dev/null
Error type HTTP/2 stream error: INTERNAL_ERROR; received from peer
Failing module golang.org/x/[email protected] from proxy.golang.org
Affected sub-packages 13 packages (runes, transform, unicode/norm, language, message, cases, encoding, encoding/charmap, encoding/htmlindex, width, encoding/unicode, secure/bidirule, unicode/bidi)
PR files changed Only test/e2e/v2/tests/hosted_cluster_azure_test.go — no dependency files touched
Job duration ~2 min (13:39:40–13:41:36 UTC) — failed early during dep resolution, never reached verify/lint
Run attempt 1 (no retries attempted)

Add a v2 lifecycle test that validates transitioning a HostedCluster
from Private to PublicAndPrivate topology and back on Azure
self-managed clusters. The test runs on the existing private cluster
variant after AzurePrivateTopologyTest completes.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@bryan-cox

Copy link
Copy Markdown
Member Author

/test e2e-azure-v2-self-managed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/platform/azure PR/issue for Azure (AzurePlatform) platform area/testing Indicates the PR includes changes for e2e testing do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants